Abstract

In this paper, we assess the performance of a data-driven anomaly
detection algorithm, the Inductive Monitoring System (IMS), which can be
used to detect simulated Thrust Vector Control (TVC) system failures.
However, the ability of IMS to detect these failures in a true operational
setting may depend on how realistically they are simulated. As such, we
investigate both a low fidelity and a high fidelity approach to simulating
such failures, with the latter based upon the underlying physics.
Furthermore, we study the ability of IMS to detect anomalies that were
previously unknown and not previously simulated, as well as apparent
deficiencies or misapplications that result from using the data-driven
paradigm. Our conclusions indicate that robust detection performance of
simulated failures using IMS is not appreciably affected by the use of a
high fidelity simulation. However, we have found that the inclusion of a
data-driven algorithm such as IMS into a suite of deployable health
management technologies does add significant value.

1  INTRODUCTION 

In preparation for the launch of Ares I-X, a data-driven anomaly detection
algorithm was deployed as part of a suite of several software tools for
inclusion in a ground diagnostics prototype to support detection and
diagnosis of potential anomalies or failures during the pre-launch phase.
The selected data-driven anomaly detection algorithm, IMS (Inductive
Monitoring System), is based on incremental clustering, and operates with a
semi-supervised anomaly detection paradigm, as defined in previous work
(Chandola et al., 2009). This implies complete reliance on training data of
only the nominal class. As such, the training data is implicitly labeled,
and there are no labels for the anomalous class. The clustering is
performed in an unsupervised manner, and any monitored data points falling
outside of the clusters are flagged as anomalous. Detailed descriptions of
how IMS performs anomaly detection are provided in previous work (Iverson,
2004), (Iverson et al., 2009), (Martin, 2010). However, a thorough
description of IMS will be incorporated in a subsequent section in this
paper to provide adequate background for those readers unfamiliar with this
technology.

Due to the lack of available nominal and fault data with which to validate
and test the algorithm for Ares I-X, data from the Thrust Vector Control
(TVC) System from previous Space Shuttle missions was used. The data
collected served two purposes: as nominal data, and as the basis for fault
data, which was constructed by seeding nominal data with failures of
various types, severity, and fidelity for subsequent validation and
testing. However, the ability of IMS to detect true failures may be
influenced by the realism of how they are simulated and subsequently
tested. As such, a significant portion of this paper will be dedicated to
investigating a computationally efficient approach to simulating such
failures, and observing the effect of the increased fidelity on detection
performance, extending what was presented in previous work (Martin et al.,
2010).

IMS was one of several data-driven anomaly detection tools that were
evaluated for inclusion as part of the suite of technologies to be
demonstrated during the Ares I-X test launch, which included both
model-based and rule-based technologies. Data-driven algorithms are just
one of three different types of algorithms that were deployed, the details
of which were presented in previous work (Iverson et al., 2009),
(Schwabacher and Waterman, 2008) and (Schwabacher et al., 2010a). The other
two types of algorithms that were deployed include a “rule-based” expert
system, and a “model-based” system. In the context of this particular
deployment, distinctions among the use of the terms “data-driven,”
“rule-based,” and “model-based,” can be found in the previously cited paper
(Schwabacher and Waterman, 2008). Within the “rule-based” and “model-based”
categories, the deployable candidates were selected based upon their flight
heritage and system certifiability.

For the rule-based system, SHINE (Spacecraft Health Inference Engine)
(James and Atkinson, 1990) was selected for deployment, which is used
within two components of BEAM (Beacon-based Exception Analysis for
Multimissions) (Mackey et al., 2001). Other components of BEAM include
various data-driven algorithms. BEAM is a patented technology developed at
NASA’s JPL (Jet Propulsion Laboratory). SHINE serves to aid in the
management and identification of operational modes. For the “model-based”
system, a commercially available package developed by QSI (Qualtech
Systems, Inc.), TEAMS (Testability Engineering and Maintenance System), was
selected for deployment to aid in diagnosis. TEAMS was highlighted in work
subsequent to its debut (Cavanaugh, 2001). In the final deployed software
package, we integrated TEAMS with IMS in the Ares I-X Ground Diagnostics
Prototype (GDP) by running the two in parallel and displaying the outputs
of both tools on the same console, and we used SHINE to provide the inputs
to TEAMS. SHINE is therefore used as a pre-processor for TEAMS. Because
SHINE is not used to generate diagnostic information, it is not possible to
directly compare its performance with that of TEAMS and IMS. Instead, as
will be shown in Sec. 5, we compare results generated by IMS with those of
the SHINE/TEAMS combination.

In this effort, it is of great importance to provide for robust and
accurate detection of a variety of known fault modes that span a number of
different rates of progression and severity. However, this capability is
already well provided for by other model-based tools (i.e., TEAMS) within
the suite of deployed tools. IMS should be able to detect these, as well as
unknown faults or anomalies that otherwise may not have been modeled,
because it works from a top-down, or data-driven, perspective rather than a
bottom-up, or model-based, perspective. A review of the resulting
performance of the entire deployed package has also been provided
(Schwabacher et al., 2010a), (Schwabacher et al., 2010b). Other related
work covering similar topics is also available in the literature (Iverson
et al., 2009), (Park et al., 2002), (Pisanich et al., 2006), (Rao et al.,
2009).

Some advantages that IMS has over the model-based and rule-based
algorithms include the fact that:

1.  It  has  the  ability  to  detect  anomalies  that  were 
previously  unknown  and  not  previously  simulated 
or  accounted  for. 

2.  It  has  the  potential  to  detect  anomalies  that  are 
precursors  of  faults  before  a  model-based  system 
detects  the  fault. 

3.  It  does  not  require  a  labor-intensive  modeling 
process. 

The disadvantages of IMS compared with model-based tools are:

1. It performs anomaly detection only, not diagnosis, so that additional
analysis is necessary to determine whether a detected anomaly is
significant or not.


2.  It  only  provides  an  acceptable  level  of  accuracy  if 
it  is  trained  using  a  sufficient  quantity  of  historical 
and/or  simulated  training  data. 

In previous work (Martin, 2010), we studied three candidates to fill the
primary role of data-driven anomaly detection, which included IMS. Of the
three algorithms tested, it was found that IMS was the best performing
algorithm when considering both overall accuracy, as quantified by the area
under the Receiver Operating Characteristic (ROC) curve (AUC), and
computational complexity. In this paper we aim to follow up with more
detail on the performance of IMS in its designated primary role as
specified above, exploring both its advantages and disadvantages. In doing
so, we will demonstrate that the other model-based and rule-based
technologies with which IMS was deployed provided certain capabilities
which IMS complemented well in some cases, while in other cases, the
performance of IMS was less than desirable due to inappropriate use.

The remainder of this paper will be organized as follows: Section 2 will
provide a detailed description of all simulated failures to be tested,
including the higher fidelity version based upon physics. Section 3
provides a comparative discussion of the performance of IMS as it relates
to the ability to robustly detect simulated failures of varying fidelities.
Section 4 will provide a general discussion of the selection of IMS as the
data-driven anomaly detection algorithm, selection of parameters, training,
validation and testing procedures. Both quantitative and qualitative
performance results for Shuttle and Ares I-X data at the pad and at the
Vehicle Assembly Building (VAB) will also be discussed. Section 5 provides
a comparative discussion of the performance of IMS and the SHINE/TEAMS
combination. The final concluding section will provide an overall summary
and epilogue.

2  SIMULATED  FAILURES 

Historical Space Shuttle data was used to test the entire Ares I-X ground
diagnostic prototype. The Space Shuttle Solid Rocket Booster (SRB) TVC is
virtually identical to the Ares I-X first-stage TVC, so the SRB TVC data
was expected to be very similar to the Ares I-X TVC data. Similarly, the
ground hydraulic system used with the SRB TVC is virtually identical to the
ground hydraulic system used with the Ares I-X TVC. These assumptions held
up modestly well after our post-flight analysis, in consideration of all
the tools that were deployed to support failure and anomaly detection. The
differences that we found in the data were caused by differences in
operations between Shuttle and Ares I-X, rather than by differences in the
TVC or HSS (Hydraulic Support System) hardware.

The SRB TVC and the associated ground hydraulic system have had very few
failures. We thus had available to us an abundance of nominal data, but
very little failure data. We therefore decided to develop a set of failure
simulations that could be used to test the ability of the prototype to
detect and diagnose failures. We inserted simulated failures into the
historical Shuttle data, and used the resulting data sets to test the
prototype before the Ares I-X launch.

Table 1: Failure Mode Summary

Failure Mode Label   Vehicle Location   Failure Mode
1a                   Pad, VAB           FSM Leak
2                    VAB                HPU overheat
3                    Pad, VAB           Hydraulic Leak
4                    VAB                Stuck actuator


Table 1 provides a summary of the failure modes that we simulated for each
vehicle location. In order to test the integration of the TVC and HSS TEAMS
models, we decided to select one failure mode that can be isolated to the
TVC (Failure Mode 1a, FSM Leak)¹, one that can be isolated to the HSS
(Failure Mode 2, HPU overheat)², and one that would produce a TEAMS
ambiguity group including both TVC and HSS candidates (Failure Mode 3,
Hydraulic Leak)³. In addition, because the actuator positioning test was
considered to be the most important pre-launch test of the TVC, we decided
to simulate a failure during this test (Failure Mode 4, Stuck actuator)⁴.
We will only describe failure mode 1a in the remainder of this section, as
it includes examples of simulations that span the range of fidelity used
for all of the failure modes. For the remaining failure modes, low fidelity
linear simulations were used and simulated in a similar fashion as the low
fidelity version of failure mode 1a. Furthermore, although the motivation
for selecting these specific failure modes was based upon support for
testing and integration of TEAMS models, they also serve as proving grounds
for testing the anomaly detection capability of IMS.

As shown in Table 1, a leak in the fuel supply module can be simulated
either at the pad or at the VAB. The leak at the pad was simulated to occur
between Go for GLS (Ground Launch Sequencer) Start (at approximately T-31
sec) and Go for SSME (Space Shuttle Main Engine) Start (at approximately
T-10 sec). The FSM pressure is simulated to drop to an off-nominal value
instead of nominally staying above a specified threshold.

¹ The Fuel Supply Module (FSM) leak is an N2H4 (hydrazine) leak resulting
in a pressure drop, and is simulated within 1 minute prior to launch at the
pad and within the 34 minute period after the calibration test in the VAB.

² The Hydraulic Pumping Unit (HPU) overheat failure is an over-temperature
failure simulated within a 25 minute period during tests in the VAB.

³ A hydraulic fluid leak will result in a hydraulic fluid reservoir level
drop that is simulated within 1 minute prior to launch at the pad and
within the 10 minute period after the calibration test in the VAB.

⁴ The actuator is simulated to be stuck during the actuator positioning
test during a 2.5 minute test in the VAB.

Similar to the other simulated failure scenarios, an initial attempt at the
construction of the FSM failure simulation involved the simple use of a
linearly decreasing ramp, given a predefined rate of degradation from the
nominal operating pressure to an off-nominal value. This linear simulation
was used to support the ROC analysis performed in a previous study (Martin,
2010). However, it is possible to use a higher fidelity physics-based
simulation for this scenario because all of the relevant data is available
for its construction. A higher fidelity failure scenario may provide a more
realistic test of our algorithm’s ability to detect the failure in reality.
The method used for the same simulated failure occurring at the VAB spans
the period of time during which APU (Auxiliary Power Unit) system checks
are conducted. Both low fidelity (linearly decreasing ramps) and high
fidelity (physics-based) failure simulations for the FSM leak will be used
for analysis of data at both the pad and the VAB to offer a fair basis for
comparison in how fidelity affects final performance. This is primarily due
to the fact that differences in detection performance between the VAB and
pad may be due to differences in operational procedures regardless of
simulation fidelity.
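The low fidelity approach described above, seeding nominal data with a linearly decreasing ramp, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the per-sample rate, and the clamping floor are all assumptions introduced here.

```python
import numpy as np


def inject_linear_ramp(pressure, start_idx, rate_per_sample, floor):
    """Return a copy of `pressure` with a linear pressure drop seeded in.

    pressure        : nominal pressure samples (hypothetical input)
    start_idx       : sample index at which the simulated leak begins
    rate_per_sample : assumed degradation rate, in pressure units per sample
    floor           : off-nominal value at which the ramp stops decreasing
    """
    seeded = np.asarray(pressure, dtype=float).copy()
    n_after = seeded.size - start_idx
    # The cumulative drop grows linearly with samples elapsed since onset.
    drop = rate_per_sample * np.arange(n_after)
    # Subtract the drop from the nominal signal and clamp at the floor.
    seeded[start_idx:] = np.maximum(seeded[start_idx:] - drop, floor)
    return seeded


# Illustrative values only: a flat 400-unit signal with a leak at sample 4.
nominal = np.full(10, 400.0)
faulty = inject_linear_ramp(nominal, start_idx=4, rate_per_sample=50.0, floor=250.0)
```

Subtracting the drop from the recorded signal, instead of overwriting it with a straight line, keeps the sensor noise of the nominal data in the seeded failure.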

The FSM pressure will begin dropping from a nominal value to venting at
atmospheric pressure over the course of a few minutes. As the FSM pressure
drops, the FSM pressure sensor will redline on a low value. To simulate
this failure through to the point at which the tank is completely voided,
we must account for both fluid phases contained in the FSM: the liquid
hydrazine and the gaseous nitrogen used to pressurize the spherical tank.
The leak in the FSM will be simulated to evolve according to the following
assumptions:

1. Assume that the geometry of the FSM is established according to
available documentation.

2. Assume that the liquid hydrazine (N2H4) is filled only to the midpoint
of the spherical tank.

3.  Assume  that  the  leak  is  below  the  surface  of  the 
liquid. 

In order to simulate the FSM leak according to physics, we will also
implicitly use all of the assumptions that result from applying the
unsteady form of Bernoulli’s equation as presented in (Munson et al., 1998)
to solve the differential equation shown as Eqn. 1 associated with the
initial leak of the liquid hydrazine. Fig. 1 depicts the leak along with
some of the geometrical constants and subscripted reference points used in
Eqn. 1.


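$$ \frac{p_g}{\rho} + \frac{V_1^2}{2} + g(h_1 - h_2) = \frac{V_2^2}{2 C_d^2} + \int_{s_1}^{s_2} \frac{\partial V}{\partial t}\, ds \qquad (1) $$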


$p_g = p_0 - p_a$ is the gage pressure in the tank, where $p_0$ is the
pressure to which the tank is pressurized with GN2, $p_a$ is atmospheric
pressure, $\rho$ is the density of liquid hydrazine, and $g$ represents the
gravitational acceleration. $C_d$ is the coefficient of discharge at the
leak point, and $s$ defines the fluid streamline along which Bernoulli’s
equation is being applied. $V_1$ and $h_1$ define the velocity and height
from the ground to the top of the liquid hydrazine, respectively.
Similarly, $V_2$ and $h_2$ define the exit velocity and height from the
ground to the site of the leak, respectively.

We assume the sphere has radius $r$, and the cross-sectional disk
representing the top surface of the liquid hydrazine shown in Fig. 1 has a
radius of $d$. Since we are interested in unanticipated decreases in the
height of the liquid hydrazine in the tank, let us define $h = h_1$ as our
independent variable to simplify Eqn. 1 for the one-dimensional case,
defined with respect to the reference $+z$ shown in Fig. 1. Furthermore, we
may apply Eqn. 2 for conservation of mass, and Eqn. 3 defines the velocity
$V_1$ as a function of the height $h_1$. The ideal gas laws, Eqns. 11-12,
are defined for constant temperature (de)pressurization, and we assume
constant acceleration via Eqn. 4. The geometry defined in Fig. 1 and
auxiliary Eqns. 5-10 involve $h_g$, the distance from the ground to the
bottom of the tank, and $h_r$, the distance from the top surface of the
liquid hydrazine in the tank to the top of the tank. Thus, the simplified
version of Eqn. 1 results in the differential equation shown as Eqn. 13.

$A_1$ and $A_2$ represent the surface areas of the N2H4/GN2 fluid interface
and the round hole through which liquid hydrazine is leaking, respectively.
$V_0$ and $V_g$ are the initial volume of GN2 and the volume of GN2 as the
leak evolves, respectively. Finally, $\dot{m}$ represents the mass flow
rate of the liquid hydrazine (N2H4), and $m_g$ represents the total mass of
the GN2 in the tank.

An approximation to the resulting differential equation can be used to
yield a separable nonlinear differential equation that can be solved in
closed form, shown as Eqn. 14. This approximation is applied by recognizing
that the left hand side of Eqn. 13 (quantifying the gravitational and
acceleration terms) is negligible relative to the right hand side. The
gravitational term is always negligible, and the acceleration term is
important only for quantification of a negligibly small transient at the
very beginning of the leak. Furthermore, constants characterizing the FSM
geometry can be simplified due to the relative sizes of the leak radius and
the radius of the N2H4/GN2 fluid interface (i.e., $d_0 \ll d(h)$). The last
assumption is that $p_a \ll p(t)$, which may contribute most to the
approximation error since the tank pressure evolves over time and will not
necessarily always be much greater than atmospheric pressure. Thus the
error may potentially grow over time as the tank pressure decreases due to
evolution of the leak. However, in general the resulting closed-form
representation will help to relieve the computational burden associated
with numerical methods otherwise required to solve the differential
equation (i.e., a stiff solver).


$$ \frac{dh}{dt} = -\frac{C_d A_2}{A_1(h)} \sqrt{\frac{2 p_0 V_0}{\rho V_g(h)}} \qquad (14) $$

Note that the negative square root of $\left(\frac{dh}{dt}\right)^2$ must
be used in Eqn. 14 in order to yield a real solution. Furthermore, by
recognizing that $\frac{dV_g}{dh} = -A_1(h)$, Eqn. 14 can be simplified to
Eqn. 15.


$$ \sqrt{V_g}\,\frac{dV_g}{dt} = C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}} \qquad (15) $$

Integrating both sides of Eqn. 15 and combining the result with Eqn. 12, we
may now write the resulting closed-form expression for the tank pressure as
a function of time, $p(t)$, shown as Eqn. 16.

where $d_0$ represents the radius of the leak area, assumed to be a round
hole, and $R$ represents the ideal gas constant for GN2. $p(t)$ and $T$
represent the absolute pressure as the leak evolves as a function of time
and the absolute temperature of the GN2, respectively.
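The integration step behind the closed-form pressure can be sketched as a short worked derivation. It is offered under the approximations already stated (gravitational and acceleration terms neglected, $p_a \ll p(t)$, isothermal ideal gas so that $p V_g = p_0 V_0$), and it may differ in form from the authors' Eqn. 16. The constant $K$ and the initial condition $V_g(0) = V_0$ are notation introduced here for convenience.

```latex
\begin{align*}
\frac{dh}{dt} &= -\frac{C_d A_2}{A_1(h)} \sqrt{\frac{2 p_0 V_0}{\rho V_g}},
  \qquad \frac{dV_g}{dh} = -A_1(h) \\
\sqrt{V_g}\,\frac{dV_g}{dt} &= K,
  \qquad K = C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}} \\
\tfrac{2}{3}\left( V_g^{3/2} - V_0^{3/2} \right) &= K t \\
V_g(t) &= \left( V_0^{3/2} + \tfrac{3}{2} K t \right)^{2/3} \\
p(t) &= \frac{p_0 V_0}{V_g(t)}
      = \frac{p_0 V_0}{\left( V_0^{3/2} + \tfrac{3}{2} K t \right)^{2/3}}
\end{align*}
```

With $A_2 = \pi d_0^2$ for a round hole and $p_0 V_0 = m_g R T$ from the ideal gas law, the same expression can be written in terms of $d_0$, $R$, $T$, and $m_g$.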


Simulation of the voiding of the remaining gaseous nitrogen (GN2) in the
FSM is performed by use of a linear 1st order approximation of a
differential equation governing the release of an ideal gas as used in
(Tchouvelev et al., 2007). The solution of the differential equation is
shown as Eqn. 17. The mass of the GN2 was obtained by use of the design
condition (1.1 lbs of gaseous nitrogen at 400 psig as a baseline), obtained
from the seminal paper on the introduction of the FSM (McCool et al.,
1980). It was also assumed that the GN2 underwent a constant temperature
and constant volume ideal depressurization (bleeding off tank pressure by
operating a GN2 pressurization valve) from this design condition to the
nominal value that existed at the time the leak was simulated when in the
VAB. The constant temperature assumption also holds for evolution of the
leak from the nominal pressure value to $p(t) = p_a$.


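$$ p(t) = p_v \exp\left[ -\frac{C_d A_2}{V_v} \sqrt{\gamma \left( \frac{2}{\gamma+1} \right)^{\frac{\gamma+1}{\gamma-1}} R T}\; t \right] \qquad (17) $$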


Of course, when $h = h_2$, the liquid hydrazine will have emptied out to
the point that it can no longer escape from the hole, and only the gaseous
nitrogen is left to escape. We call the pressure at which this occurs the
vent pressure, $p_v$, which can easily be computed using Eqns. 7, 10, and
12. The time of this event can be approximated by using Eqn. 16. The
corresponding volume of gas left to be evacuated from the tank is $V_v$,
and $\gamma$ is the ratio of specific heats for GN2. Therefore, Eqn. 16
governs the release of liquid hydrazine until the time of the vent
pressure. At this point, Eqn. 17 governs the subsequent release of gaseous
nitrogen and complete voiding of the tank, at which point $p(t) = p_a$.
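The hand-off between the two phases described above can be sketched as a piecewise function. This is an illustration under stated assumptions, not the authors' code. The liquid phase uses the isothermal relation $p V_g = p_0 V_0$ with a lumped constant `k_liquid` $= C_d A_2 \sqrt{2 p_0 V_0 / \rho}$. The gas phase is modeled as a generic first-order exponential decay with a hypothetical lumped rate `k_gas`, floored at atmospheric pressure. All names and numeric values are invented for the example.

```python
import math


def fsm_pressure(t, p0, v0, vv, k_liquid, k_gas, pa):
    """Tank pressure at time t (seconds) for a two-phase leak sketch.

    p0, v0   : initial GN2 pressure and volume
    vv       : GN2 volume once the liquid level reaches the leak (h = h2)
    k_liquid : assumed lumped liquid-leak constant, Cd*A2*sqrt(2*p0*v0/rho)
    k_gas    : hypothetical lumped decay rate (1/s) for the gas phase
    pa       : atmospheric pressure, used as a floor
    """
    # Vent time: when the gas volume has grown from v0 to vv.
    t_vent = (vv ** 1.5 - v0 ** 1.5) / (1.5 * k_liquid)
    if t <= t_vent:
        # Liquid phase: the gas expands isothermally as hydrazine drains.
        vg = (v0 ** 1.5 + 1.5 * k_liquid * t) ** (2.0 / 3.0)
        return p0 * v0 / vg
    # Gas phase: exponential decay from the vent pressure, floored at pa.
    pv = p0 * v0 / vv
    return max(pa, pv * math.exp(-k_gas * (t - t_vent)))


# Illustrative SI values only (Pa, m^3); not taken from the FSM design data.
params = dict(p0=2.0e6, v0=0.01, vv=0.02, k_liquid=1.0e-5, k_gas=0.5, pa=1.0e5)
p_start = fsm_pressure(0.0, **params)
p_late = fsm_pressure(1.0e4, **params)
```

Measuring the gas-phase time from the vent time keeps the pressure continuous at the switch between the two regimes.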


3  COMPARATIVE  ANALYSIS 

In this section we aim to investigate and observe the effect of increased
simulation fidelity on detection performance. In doing so we hope to gain a
better understanding of whether IMS is better able to detect simulated
failures that may be more realistic. In the previous section, we provided
the details of how a high fidelity, physics-based simulation of a fuel
supply module leak evolves, according to Eqns. 16 and 17. Using these
equations, the time at which the pressure in the FSM approximately reaches
atmospheric pressure in the high fidelity simulation can be used to
construct the slope of the line associated with the low fidelity
simulation, for a fixed leak radius. Thus, implicit linearized versions of
Eqns. 16 and 17 represent low fidelity simulations. The slope of the
resulting line will determine the rate of degradation, to be used as a fair
basis for comparison to the nonlinear rate of degradation which evolves
according to physics. Fig. 2 illustrates the times to be used to construct
the slopes of low fidelity linear simulations, based upon various leak
radii that were simulated with the high fidelity physics-based simulations.

Figure 2: Time to FSM Voiding for Various Leak Radii
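The slope-matching step described above can be sketched as follows: find the time at which a high fidelity pressure curve first reaches atmospheric pressure, then draw a straight line from the starting pressure to atmospheric pressure over the same interval. The function name, the relative tolerance, and the sample curve are assumptions made for illustration only.

```python
import numpy as np


def matched_linear_ramp(t, p_high, p_atm, tol=1e-3):
    """Return (slope, ramp) for a line that voids at the same time as p_high.

    t      : sample times
    p_high : high fidelity pressure samples at those times
    p_atm  : atmospheric pressure
    tol    : assumed relative tolerance for "approximately reaches" p_atm
    """
    t = np.asarray(t, dtype=float)
    p_high = np.asarray(p_high, dtype=float)
    # First sample at which the high fidelity curve is within tol of p_atm.
    reached = np.nonzero(p_high <= p_atm * (1.0 + tol))[0]
    if reached.size == 0 or reached[0] == 0:
        raise ValueError("curve must start above and later reach p_atm")
    t_void = t[reached[0]]
    # Slope of the line joining the start pressure to p_atm at t_void.
    slope = (p_atm - p_high[0]) / (t_void - t[0])
    # Low fidelity ramp, held at p_atm once the tank has voided.
    ramp = np.maximum(p_high[0] + slope * (t - t[0]), p_atm)
    return slope, ramp


# Made-up nonlinear curve that reaches p_atm = 10 at t = 5.
t = np.linspace(0.0, 10.0, 11)
p_high = np.maximum(100.0 - 3.6 * t ** 2, 10.0)
slope, ramp = matched_linear_ramp(t, p_high, p_atm=10.0)
```

Both curves then share start pressure, end pressure, and time to voiding, so any difference in detection performance comes from the shape of the decay alone.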

The detection performance can be quantified by the Area under the ROC
(Receiver Operating Characteristic) curve (AUC). The ROC curve is a plot of
the true positive rate against the false positive rate, and can be used to
help make the tradeoff between these two rates. The curve is constructed by
treating time points as representative samples, all of which are implicitly
used to compute the true and false positive rates. The AUC is loosely a
measure of accuracy over all possible tradeoffs between the true positive
rate and the false positive rate, computed by numerically integrating the
area under the ROC curve. More formally, the AUC represents the probability
that a randomly chosen failure data point is more suspect than a randomly
chosen nominal data point (Rosset, 2004). An AUC of one thus indicates
perfect ranking of these two randomly selected data points.
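The probabilistic reading of the AUC given above can be computed directly by comparing every failure score with every nominal score. The sketch below assumes that a higher anomaly score means "more suspect" and counts ties as half a win, a common convention. The function name and sample scores are illustrative.

```python
def pairwise_auc(nominal_scores, failure_scores):
    """Fraction of (failure, nominal) pairs in which the failure point
    receives the higher anomaly score; ties count as one half."""
    wins = 0.0
    for f in failure_scores:
        for n in nominal_scores:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(failure_scores) * len(nominal_scores))


# Perfect separation gives 1.0; one misordered pair out of four gives 0.75.
perfect = pairwise_auc([0.1, 0.2, 0.3], [0.4, 0.5])
partial = pairwise_auc([0.1, 0.4], [0.3, 0.5])
```

This quadratic-time form is meant only to make the definition concrete; integrating the ROC curve numerically, as described in the text, gives the same quantity.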

As such, Fig. 3 demonstrates how detection performance varies across a
range of leak radii for both the high and low-fidelity simulations of FSM
leaks, using Shuttle data at the pad as the sole exemplar. Detection
performance using Shuttle data from the VAB is poorer than that at the pad
due to reasons to be described in Sec. 4. These reasons are also specific
to the FSM leak failure mode 1a, but otherwise performance using the VAB
Shuttle data exhibits the same tendencies as performance based upon using
data from the pad.

Figure 3: High vs. Low Fidelity Simulation Detection Performance (Pad AUC
vs. Leak Radius)

Two main observations can be made regarding Fig. 3. First, it is evident
that robust detection performance improves as the leak radius increases, as
quantified by the AUC, regardless of the simulation fidelity. This meets
with intuition, since a faster leak should be more easily and quickly
detectable. The second observation is that the detection performance of the
high and low fidelity simulations converges as the leak radius increases.
This also meets with intuition, since a faster leak can be more easily
approximated in a linear fashion. However, as we can tell from the error
bars, there is quite a bit of overlap between the high and low fidelity
simulation methods for fast leaks, and even for slow leaks. Thus, there is
no appreciable difference between the detection performance results for the
low and high fidelity simulations, and as such we will not make the
distinction between the two for the remainder of the paper.

4  ANOMALY  DETECTION 

As mentioned previously, IMS works under the principle of semi-supervised
anomaly detection by building a model of the nominal historical data on
which it is trained. It has also been described as a distance-based anomaly
detection tool that uses a data-driven technique called incremental
clustering to extract models of normal system operation from archived data
(Iverson, 2004). We will now provide a thorough discussion of IMS, which
was originally presented in a previous paper (Iverson et al., 2009), but
will be repeated here to provide comprehensive treatment of the algorithm.

Because IMS only models the nominal data, and does not model any failure
modes, it can potentially detect unknown failure modes. IMS works with
vectors of data values, which it collects for periods of normal system
operation during the learning process to build a system model. It
characterizes how the parameters relate to one another during normal
operation by finding areas in the vector space where nominal data tends to
fall. These areas are called nominal operating regions and correspond to
clusters of nearby, similar points found by the IMS clustering algorithm.
IMS represents these nominal operating regions as hyper-boxes in the vector
space, providing a minimum and maximum value limit for each parameter of a
vector contained in a particular hyper-box. These hyper-box cluster
specifications are stored in a knowledge base (KB) that IMS uses for
real-time telemetry monitoring or archived data analysis. Figure 4 shows an
overview of the IMS method.
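Monitoring against a knowledge base of hyper-boxes can be sketched as below. The sketch assumes the anomaly score is the Euclidean distance from a monitored vector to the nearest hyper-box, which is zero when the vector falls inside a box. That choice is one reading of "distance-based" and is not confirmed by the text above. The function names and the two-box knowledge base are hypothetical.

```python
import numpy as np


def distance_to_box(x, lower, upper):
    """Euclidean distance from vector x to an axis-aligned hyper-box."""
    x = np.asarray(x, dtype=float)
    # Per-dimension shortfall below the minimum or excess above the maximum;
    # both terms are zero for a dimension that lies within its limits.
    gap = np.maximum(lower - x, 0.0) + np.maximum(x - upper, 0.0)
    return float(np.linalg.norm(gap))


def anomaly_score(x, knowledge_base):
    """Distance from x to the closest hyper-box in the knowledge base."""
    return min(distance_to_box(x, lo, hi) for lo, hi in knowledge_base)


# Hypothetical knowledge base: two nominal operating regions in 2-D.
kb = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([5.0, 5.0]), np.array([6.0, 6.0])),
]
inside = anomaly_score([0.5, 0.5], kb)   # falls within the first box
outside = anomaly_score([1.0, 3.0], kb)  # 2 units above the first box
```

A score of zero corresponds to nominal data, and any positive score marks a point outside all clusters; thresholding or ranking these scores is what feeds an ROC analysis.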

4.1  IMS  Learning  Process 

In general, the number and extent of nominal operating regions created
during the IMS learning process is determined by three learning parameters:
the “maximum interpretation” (max interp) parameter can be used to adjust
the size and number of clusters derived from a fixed number of training
data points, the “initial cluster size” is used to adjust the tolerance of
newly created nominal operating regions, and the “cluster growth percent”
is used to adjust the percent increase in size of a nominal operating
region when incorporating new training data vectors. More specifically, the
learning algorithm builds a knowledge base of clusters from successively
processed vectors of training data. As such, the clustering approach is
incremental in nature, which distinguishes it from well-known methods such
as k-means clustering where the resulting clusters are independent of the
ordering of the vectors. With the processing of each new training data
vector, the Euclidean distance from this new vector to the centroid of the
nearest cluster in the knowledge base is computed. If this distance is
below a pre-specified “max interp” value, the new vector is summarily
incorporated into that cluster. The upper or lower limits for each affected
dimension of the cluster are expanded respectively according to the
“cluster growth percent” parameter to reflect the inclusion of the new
vector. This incremental, inductive process gives IMS an advantage over
other clustering methods such as k-means, since it tends to group
temporally related points during the learning process. The grouping of
temporally related points may also aid in discovering distinct system
operations, which makes IMS more amenable to the specific goal of
monitoring time series data for system operations.

The  “cluster  growth  percent”  parameter  is  used  to 
adjust  the  learning  rate.  It  establishes  a  fixed  “growth” 
percentage  difference  for  expansion  of  each  dimen¬ 
sion  when  updating  previously  formed  clusters.  This 
“cluster  growth  percent”  learning  parameter  is  there¬ 
fore  clearly  proportional  to  the  learning  rate,  due  to 
the  increased  number  of  training  data  points  that  will 
be  assigned  to  each  new  cluster  per  iteration  for  higher 
values  of  the  “cluster  growth  percent”  parameter.  Nat¬ 
urally,  the  number  of  clusters  in  the  knowledge  base 
for  a  given  training  data  set  will  increase  as  the  “max 
interp”  value  and  “cluster  growth  percent”  values  are 
decreased. Therefore, an inverse relationship between
the “max interp” value and the number of clusters in
the knowledge base exists. This dependence can be
exploited to regulate the final size of the knowledge
base in order to accommodate resource limitations in
the computers running IMS.

Figure 4: Inductive Monitoring System (IMS) overview

If  the  distance  between  a  newly  processed  vector 
and  the  centroid  of  the  nearest  cluster  in  the  knowledge 
base  is  above  the  pre-specified  “max  interp”  value,  a 
new  cluster  is  created.  The  formation  of  a  new  cluster 
is  accomplished  by  creating  a  hyper-box  whose  dimen¬ 
sions  are  based  upon  forming  a  window  around  each 
element  of  the  new  training  data  vector.  The  window  is 
defined  by  introducing  the  “initial  cluster  size”  param¬ 
eter  which  is  used  to  adjust  the  learning  tolerance.  This 
“initial  cluster  size”  learning  parameter  represents  a 
fixed  percentage  of  the  value  for  each  dimension  of  the 
new  training  vector.  As  such,  it  relates  directly  to  the 
size  of  newly  established  clusters,  otherwise  known  as 
the  “learning  tolerance.”  The  “initial  cluster  size”  and 
“cluster  growth  percent”  learning  parameters  also  act 
as  buffers  which  enable  a  provisional  allowance  for 
manufacturing  sensor  tolerances  and  for  sensors  that 
may  have  suffered  from  deterioration  due  to  wear.  Fur¬ 
thermore,  these  learning  parameters  provide  increased 
coverage  to  compensate  for  training  data  that  may  not 
fully  characterize  the  nominal  performance  envelope. 
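
The learning steps described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the IMS implementation: the class name HyperBoxKB, the use of the hyper-box midpoint as the cluster centroid, and the rule that growth is a fraction of the current box extent in the affected dimension are all assumptions made for the example, since the text does not specify them.

```python
import numpy as np


class HyperBoxKB:
    """Toy knowledge base of hyper-box clusters (hypothetical, not the IMS code)."""

    def __init__(self, max_interp, initial_cluster_size=0.01,
                 cluster_growth_percent=0.01):
        self.max_interp = max_interp
        self.initial_cluster_size = initial_cluster_size
        self.cluster_growth_percent = cluster_growth_percent
        self.lows = []   # lower limit per dimension, one array per cluster
        self.highs = []  # upper limit per dimension, one array per cluster

    def _centroids(self):
        # Assumption: the centroid is taken as the midpoint of each hyper-box.
        return (np.array(self.lows) + np.array(self.highs)) / 2.0

    def learn(self, vector):
        """Process one training vector; return the index of its cluster."""
        v = np.asarray(vector, dtype=float)
        if self.lows:
            dists = np.linalg.norm(self._centroids() - v, axis=1)
            k = int(np.argmin(dists))
            if dists[k] < self.max_interp:
                self._expand(k, v)
                return k
        # Otherwise create a new hyper-box: a window around each element,
        # sized as a fixed fraction of that element's value.
        half_width = np.abs(v) * self.initial_cluster_size
        self.lows.append(v - half_width)
        self.highs.append(v + half_width)
        return len(self.lows) - 1

    def _expand(self, k, v):
        low, high = self.lows[k], self.highs[k]
        # Assumption: growth padding is a fraction of the current box extent.
        pad = (high - low) * self.cluster_growth_percent
        below, above = v < low, v > high
        low[below] = v[below] - pad[below]
        high[above] = v[above] + pad[above]
```

Because vectors are processed one at a time, feeding the same training data in a different order can yield a different set of hyper-boxes, which is the order dependence noted above.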

4.2  IMS  Monitoring  Process  and  Data  Analysis 

During the monitoring operation, IMS reads and normalizes
real-time or archived data values, formats
them into the predefined vector structure, and searches
the knowledge base of nominal operating regions to
see how well the new data vector fits the nominal system
characterization. After each search, IMS returns
the Euclidean distance from the new vector to the nearest
nominal operating region, called the composite distance.
Data that matches the normal training data well
will have a composite distance of zero. If one or more
of the data parameters is slightly outside of expected
values, a small non-zero result is returned. As incoming
data deviates further from the normal system
data, indicating a possible malfunction, IMS will return
a higher composite distance value to alert users to
the anomaly. IMS also calculates the contribution of
each individual parameter to the composite deviation,
which can help identify and isolate the cause of the
anomaly.
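
The composite distance can be illustrated with a short sketch. The function name composite_distance and the array layout (one row of lower and upper limits per hyper-box) are assumptions for this example, and the per-parameter contribution is taken here simply as the excursion of each parameter beyond the limits of the nearest box, which may differ from the quantity IMS actually reports.

```python
import numpy as np


def composite_distance(vector, lows, highs):
    """Distance from a vector to the nearest hyper-box, plus per-parameter excursions.

    lows and highs have shape (n_clusters, n_parameters).
    """
    v = np.asarray(vector, dtype=float)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    # Excursion of each parameter outside each box; zero when within limits.
    excess = np.maximum(lows - v, 0.0) + np.maximum(v - highs, 0.0)
    dists = np.linalg.norm(excess, axis=1)  # Euclidean distance to each box
    k = int(np.argmin(dists))               # nearest nominal operating region
    return float(dists[k]), excess[k]
```

A vector inside any hyper-box yields a distance of zero, and the returned excursions indicate which parameters drive a non-zero score.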

The “distance from nominal” output of IMS will be
the scores mentioned earlier: a composite score for the
set of parameters as a whole and a separate score for
each individual parameter. Additionally, to flag
out-of-bounds conditions, IMS will output an alarm
when scores exceed a specified threshold. There are a
number of methods to determine the threshold. Theoretically,
if there were a comprehensive training data
set available, IMS could learn the high and low thresholds
of each individual parameter under every operating
condition, so any deviation from the IMS model
would warrant an alert. However, as a practical measure,
the simplest and most often used method in previous
deployments has been for the user to specify a
threshold value based heuristically upon the statistics
of available validation data. For example, twice the
standard deviation of the composite score applied to
validation data may serve as an alert level, or higher
multiples of this value, depending on the skewness and
kurtosis of the underlying distribution of the composite
score. However, because the distribution
will invariably change as a function of the data provided,
the alert thresholds may vary drastically from
one dataset to another. As such, an alternative method
can be used which has gained traction and is quickly
becoming the standard for assessing performance of
classification algorithms within the machine learning
community and more recently the aerospace ISHM domain.
This method involves the use of ROC (Receiver
Operating Characteristic) curve analysis, and the area
under the ROC curve (AUC), which was previously introduced
to support the comparative analysis presented
in Sec. 3. As such, one of the tasks of the Ares I-X
Ground Based Diagnostics (GBD) project was to employ
this method for threshold selection, which was
applied in previous work (Martin, 2010), although a
more heuristic method was used for threshold selection
in this paper.
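
As a minimal sketch of the heuristic described above, the alert level can be computed as a multiple of the standard deviation of the composite score on validation data. The function names and the default multiple of two are illustrative assumptions; a higher multiple may be appropriate for skewed or heavy-tailed score distributions.

```python
import numpy as np


def heuristic_threshold(validation_scores, multiple=2.0):
    """Alert level as a multiple of the std. dev. of validation composite scores."""
    scores = np.asarray(validation_scores, dtype=float)
    return multiple * scores.std()


def alarms(scores, threshold):
    """Boolean mask marking the samples whose score exceeds the alert level."""
    return np.asarray(scores, dtype=float) > threshold
```

Because the threshold depends on the validation data supplied, the same multiple can produce very different alert levels on different datasets.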

Prior to the Ares I-X launch, we trained IMS on historical
Space Shuttle data, and tested it using historical
Shuttle data into which we had inserted simulated
failures. During the Ares I-X pre-launch period, IMS
processed live Ares I-X data, using a knowledge base
that was the result of training IMS on historical Shuttle
data. The remainder of this section describes the selection
of measurements for use with IMS, the training
and testing procedures used, and the results obtained
both on Shuttle data and on Ares I-X data. The section
concludes with a summary of the results.

4.3  Parameter  Selection 

For  Ares  I-X  data  to  be  compatible  with  historical 
Shuttle  data  a  common  set  of  parameters  needed  to  be 
chosen.  During  the  Shuttle  analysis  on  chosen  sim¬ 
ulated  faults,  all  continuous-valued  parameters  were 
selected  along  with  one  discrete  parameter  that  was 
known  to  be  critical  in  detecting  one  of  the  three  failure 
simulations,  for  a  total  of  137  parameters.  The  choice 
of  using  mostly  continuous  parameters  was  made  be¬ 
cause  historically  IMS  has  performed  better  when  op¬ 
erating  on  mostly  continuous  sensor  values.  After  run¬ 
ning  an  analysis  on  the  failure  simulations,  some  false 
alarms  were  detected  and  an  additional  set  of  param¬ 
eters  were  eliminated,  leaving  102  parameters.  For 
the  purposes  of  feature  selection  (parameter  elimina¬ 
tion),  a  false  alarm  is  defined  qualitatively  as  a  large 
excursion  above  an  apparent  “baseline”  in  the  com¬ 
posite  score  produced  by  IMS,  which  characterizes 
the  anomalousness  of  a  specific  point  in  the  time  se¬ 
ries.  With  the  elimination  of  these  parameters  the  false 
alarms  were  significantly  reduced.  When  the  first  set 
of  Ares  I-X  VAB  data  was  recorded  a  common  subset 
was  selected  between  the  Ares  I-X  parameter  set  and 
the  102  parameters  from  the  Shuttle  resulting  in  the  33 
parameters  used  for  analysis  on  the  Ares  I-X  data. 

4.4  Training  and  Testing  Procedures 

For the purpose of training and testing IMS, we used
historical Space Shuttle data into which we inserted
simulated failures, with varying rates of degradation,
and spanning fixed time periods in a random fashion.
Although the main purpose of using IMS in the
Ground Diagnostics Prototype is to detect unknown
failures, we tested it by using simulations of known
failures. (For obvious reasons, we were unable to simulate
unknown failures.) As has been previously discussed,
IMS has a number of tunable input parameters;
however, one key parameter that was very important to
tune was the “max interp” parameter. Because this parameter
directly influences the number of clusters created
in the learning phase, it has a major influence
on the final anomaly score calculated by IMS.
Recall that the total number of clusters formed
decreases as the max interp value increases.

Figure 5: AUC as a function of IMS Parameter Max
Interp for Shuttle data from the VAB

The ROC curve may be used for optimized selection
of IMS parameters (i.e., the number of clusters
in  a  knowledge  base).  This  can  be  performed  by  using 
the  AUC,  which  we  know  loosely  translates  to  over¬ 
all  classification  discriminability.  Therefore,  by  max¬ 
imizing  the  AUC  with  respect  to  the  IMS  parameters 
that  control  the  number  of  clusters,  we  can  ensure  that 
the  knowledge  base  used  by  IMS  in  order  to  perform 
threshold  or  alert  selection  has  the  best  anomaly  de¬ 
tection  capability  possible.  To  determine  the  optimal 
max  interp  value  and  corresponding  number  of  clus¬ 
ters,  a  set  of  cross  validation  runs  was  performed  on  a 
set  of  Shuttle  VAB  and  pad  data,  using  the  AUC  as  the 
governing  metric  for  optimization.  Cross  validation  is 
a  technique  for  estimating  the  accuracy  of  a  machine 
learning  algorithm,  by  training  and  testing  the  algo¬ 
rithm  multiple  times,  each  time  using  different  subsets 
of  the  available  data  for  training  and  testing,  and  then 
averaging  the  results. 
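
The selection procedure can be sketched as follows. The AUC is computed here from its rank interpretation (the fraction of anomalous/nominal pairs that the score orders correctly), and the candidate max interp value with the highest mean AUC over the folds is chosen. The function names and the score_fn callback, which is assumed to train a knowledge base with a given max interp value and return composite scores for the test fold, are hypothetical and stand in for the actual IMS training and monitoring steps.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve; labels are truthy for anomalous samples."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Count correctly ordered (anomalous, nominal) pairs; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def select_max_interp(candidates, folds, score_fn):
    """Return the candidate with the highest mean AUC, and all mean AUCs.

    folds is a list of (train_data, test_data, test_labels) tuples.
    """
    mean_auc = {}
    for m in candidates:
        fold_aucs = [auc(score_fn(m, train, test), labels)
                     for train, test, labels in folds]
        mean_auc[m] = float(np.mean(fold_aucs))
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc
```

Averaging over folds and failure scenarios in this way weights each of them equally.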

4.5  Results  on  Shuttle  Simulations 

Once the cross validation runs were complete, the areas
under the ROC curves were calculated using data
that spans the time that the shuttle was still in the VAB.
Figure 5 shows the maximum, minimum, and average
AUC over the three-fold cross validations and three
fault scenarios (listed as failure modes 1a, 2, and 3 in
Table 1) for each max interp value. Due to the nature
of this aggregation, we have tacitly assumed an
equal probability of occurrence of each of the failure
modes.³

³ It should be noted that this assumption was based upon
convenience in generating the results, and not on empirical
data. It is quite possible that the AUC and ROC curves would
differ if certain faults were more or less likely than others.

Figure 6: ROC Curve for Optimal Max Interp for Shuttle
data from the VAB

The optimal max interp value that was chosen is
marked in Figure 5. The highest mean AUC
is 0.86893, and corresponds to the optimal
max interp value of 0.13. The ROC curve associated
with this optimized max interp value can be seen in
Figure 6. The relatively modest detection performance
at the VAB can be attributed to the fact that IMS had
difficulty detecting simulated failure 1a. This difficulty
stemmed from the fact that the increase in IMS score
resulting from this simulated failure was not much
larger than the nominal variation in the IMS score, so
it was not possible to select a threshold that would allow
IMS to detect all of the simulated failures without
increasing the number of false alarms. Thus, some failure
modes are easily detected using the distance-based
approach with IMS clustering, while others are not.
When IMS is used in parallel with the TEAMS/SHINE
combination, the combined system should detect all of
the failures that are modeled with TEAMS. The advantage
of using IMS is that it has the potential to detect
failures that were not modeled, as well as anomalies
that are not yet failures.
For the pad, the AUC is 0.99919, indicating that IMS
does an excellent job of detecting the two simulated
failure modes 1a and 3 at the pad, and performs much
better than at the VAB. An intuitive explanation for
this discrepancy relates to the fact that at the pad only a
small portion of the data has high “activity,” during the
last minute before launch. However, data from quiescent
periods previous to the last minute before launch
are also used for analysis. As such, this translates to
a lower signal to noise ratio, which directly influences
the AUC, resulting in a higher value and thus fewer
false positives.

4.6  Results  on  Ares  I-X 

Once the optimal max interp parameter was determined
from the Shuttle data, IMS was trained on 33
measurements using Shuttle data from seven flights,
which also represents the greatest common subset
corresponding to equivalent Ares I-X measurements.
After building the knowledge base, the Ares I-X data was
evaluated against it, and thus ostensibly acts as hold-out
test data from a machine learning standpoint. The resulting
IMS scores for the VAB are shown in Figure 7. With
the initial set of 33 measurements, 3 periods of anomalous
behavior were flagged by IMS; they are labeled
as three “False Alarms” in Figure 7. We performed
an analysis of each “false alarm”; here we present the
analysis of False Alarm 1 as an example. We determined
that False Alarm 1 was primarily caused by two
measurements. The contributing IMS scores for these
two measurements are plotted in Figure 8.

Figure 7: Ares I-X VAB Global IMS Score with False
Alarms

False  Alarm  1  was  caused  by  a  difference  between 
the  Space  Shuttle  and  Ares  I-X  data.  In  recent  years, 
the  TVC  actuator  tests  performed  in  the  VAB  have  all 
been  “pinned”  tests,  meaning  that  the  actuator  is  phys¬ 
ically  pinned  to  the  nozzle  during  testing,  so  that  the 
nozzle  moves  during  the  test.  The  first  TVC  actuator 
position  test  performed  in  the  VAB  for  Ares  I-X  was 
an  “unpinned”  test,  meaning  that  the  actuator  was  de¬ 
tached  from  the  nozzle,  and  the  nozzle  did  not  move 
during  the  test.  Because  the  actuator  was  unpinned, 
it  was  able  to  move  through  a  larger  range  of  motion 
that  is  not  possible  during  pinned  testing.  IMS  there¬ 
fore  saw  rock  and  tilt  position  values  that  it  had  never 
seen  in  the  Shuttle  data,  which  it  flagged  as  anomalies. 
These  anomalies  are  “false  alarms”  in  the  sense  that 
they  are  not  failures,  but  they  do  illustrate  the  ability 
of  IMS  to  detect  new  data  that  is  different  from  what  it 
has  seen  before.  We  performed  a  similar  analysis  for 
the  pad,  where  there  were  fewer  anomalies  identified 
by  IMS.  Like  the  anomalies  detected  at  the  VAB,  the 
anomalies  detected  at  the  launch  pad  were  caused  by 
operational  differences  between  Shuttle  and  Ares  I-X. 

4.7  Summary  of  IMS  Deployment  Results 

The experiments that we ran before the Ares I-X
launch using historical Space Shuttle data with simulated
failures demonstrated that IMS is able to detect
most of the simulated failures, but not all of them.
In particular, it had difficulty detecting the simulated
failure mode 1a in the VAB due to its relatively small
contribution to the overall IMS anomaly score compared
to the other two simulated failure modes. IMS is
not trained to detect specific failure modes; it detects
data that is anomalous according to its cluster-based
model. We expect that many known and unknown failure
modes will be detected as anomalies by IMS, but it
is not guaranteed to detect all possible failure modes.
The advantage of using IMS together with a model-based
diagnosis system such as TEAMS is that it adds
the potential to detect unknown failure modes and to
detect precursors of failures.

Figure 8: VAB Top Contributing IMS Scores For False Alarm 1

The results of running IMS on Ares I-X data, using
a knowledge base that was trained on historical
Space Shuttle data, confirm our hypothesis that the
Ares I-X TVC data is reasonably similar to the Space
Shuttle SRB TVC data. Most of the time IMS produced
small anomaly scores when run on the Ares I-X
data. IMS did detect some “anomalies” in the Ares I-X
data. These anomalies were “false alarms” in the
sense that they were not failures but rather caused by
operations performed differently for Ares I-X versus
Shuttle; hence, they illustrate the ability of IMS to detect
new data that is different from what has been seen
in the past.

5  IMS/TEAMS  PERFORMANCE 
COMPARISON 

5.1  Anomalies  Detected  by  IMS  That  Were  Not 
Detected  by  TEAMS 

We  have  seen  that  IMS  detected  some  interest¬ 
ing  anomalies  that  were  not  detected  by  TEAMS 
because  they  were  not  failures  as  defined  in  the 
FMEA  (Failure  Modes  and  Effects  Analysis)  and  the 
other  documents  on  which  the  TEAMS  models  were 
based.  One  such  anomaly  was  the  pinned/unpinned 
actuator  anomaly  mentioned  previously.  In  the 
pinned/unpinned  anomaly  there  were  procedural  dif¬ 
ferences  between  the  TVC  test  for  the  Shuttle  and  Ares 
I-X,  resulting  in  IMS  signaling  an  anomaly  in  the  TVC 
rock  and  tilt  actuator  positions.  This  anomaly  was  not 
a  failure;  hence  it  was  detected  by  IMS  but  not  by 
TEAMS.  Furthermore,  it  was  found  that  there  are  other 
differences  between  Shuttle  and  Ares  I-X  actuator  tests 
due  to  the  sequence  being  changed  slightly  along  with 
the  increased  range  of  the  actuator.  Ostensibly,  this 
had  an  even  greater  effect  than  the  pinned/unpinned 
variation  alone  for  IMS. 

5.2  Failures  Detected  Earlier  by  IMS  Than  by 
TEAMS 

Table 2 summarizes the detection times for the simulated
failures that were detected both by IMS and
TEAMS in minutes after injection of the failure. A
hypothesized advantage of IMS is that it may detect
certain failures before TEAMS. However, on average
the results show that TEAMS detected failures earlier
than IMS did. On two occasions, IMS was
able to detect a simulated failure prior to TEAMS,
as shown in red in Table 2. In the case of failure 3,
which was simulated with a simple bit flip at the VAB,
the detection occurred at approximately the same time.
The other two failures are more complicated, and are
described by gradual ramps of continuous-valued parameters
rather than instantaneous bit flips of discrete-valued
parameters, which accounts for the notable differences in
detection times. It can be seen from Table 2 that IMS
sometimes detected failures earlier than TEAMS did,
but more often it detected them later. Thus, there may
be some marginal advantage to running IMS in parallel
with TEAMS in order to provide earlier detection
of some failures. Another observation worth noting is
that there appears to be a wider variance for the IMS
detection latencies for a given failure simulation spanning
several flights. This is consistent with the fact that
TEAMS detection times are based purely upon logic
rather than statistics, the latter of which IMS incorporates
in its detection capability.

5.3  Failures  Detected  By  TEAMS  That  Were  Not 
Detected  By  IMS 

IMS  occasionally  misses  simulated  failures,  usually  as 
a  function  of  the  fine  tuning  required  to  mitigate  spe¬ 
cific  instances  of  false  alarms  on  test  (Ares  I-X)  data. 
This  fine  tuning  involves  varying  the  number  of  clus¬ 
ters  in  the  knowledge  base,  the  measurements  (sensor 
values)  represented  in  the  knowledge  base,  as  well  as 
the  threshold  or  qualitative  heuristic  used  following  the 
application  of  ROC  analysis.  Typically,  ROC  curves 
span  multiple  failures,  but  are  based  only  on  a  limited 
number  of  Shuttle  flights  for  training  data.  As  such, 
when  applying  the  resulting  knowledge  base  to  unseen 
hold out test (e.g., Ares I-X) data, simulated failures
may  not  be  detected.  In  fact,  great  measures  may  need 
to  be  taken  in  order  for  such  failures  to  be  detected,  of¬ 
ten  at  the  expense  of  false  alarms,  as  is  apparent  in  the 
examples  of  false  alarms  presented  previously. 
Table 2: Summary of simulated failure detection times (all shown in seconds)

Failure  Flight   Trial  IMS Detection Time  TEAMS Detection Time  Difference
1a       STS-107  1      8.77                2.24                  6.53
1a       STS-107  2      1931                5704                  174747
1a       STS-107  3      B748                1739                  12709
1a       STS-112  1      217.37              2.2                   215.14
1a       STS-112  2      337                 5704                  U777
1a       STS-112  3      222.07              1.38                  220.7
1a       STS-120  1      039                 2723                  -1.34
1a       STS-120  2      8.69                5.02                  3.67
1a       STS-120  3      3.45                1738                  2707
2        STS-112  1      1.76                1.5                   0.26
2        STS-112  2      TUI                 2.33                  1.68
2        STS-112  3      3778                2.4                   08
2        STS-120  1      4792                2.57                  2.35
2        STS-120  2      3762                2732                  1.3
2        STS-120  3      3789                2739                  1.5
3        STS-112  1      0                   0                     0
3        STS-112  2      0                   0                     0
3        STS-112  3      0                   0                     0
3        STS-120  1      0                   0                     0
3        STS-120  2      0                   0                     0
3        STS-120  3      0                   0                     0


5.4  Failures  More  Appropriate  For  Modeling 
With  TEAMS 

Anomaly  detection  methods  such  as  IMS  are  not  well 
suited  for  detecting  certain  types  of  failures.  As  men¬ 
tioned  previously,  we  used  simulations  of  known  fail¬ 
ure  modes  to  test  IMS.  For  some  of  these  simulated 
failures,  we  expended  a  lot  of  effort  in  tuning  IMS  to 
get  IMS  to  detect  the  simulated  failures.  This  tuning 
process  included  reducing  the  set  of  measurements  that 
were  used  to  train  IMS.  For  failure  mode  4,  a  simulated 
failure  covering  a  stuck  actuator  during  a  simulated 
positioning  test  at  the  VAB,  almost  all  measurements 
other  than  the  one  required  to  simulate  the  failure  had 
to  be  excluded  in  order  to  provide  adequate  detection 
capability.  For  this  same  case,  a  linear  regression  was 
required  in  order  to  facilitate  the  construction  of  com¬ 
manded  position  computed  by  proxy  of  a  commanded 
current  measurement  due  to  the  absence  of  the  requi¬ 
site  electromechanical  conversion  data.  The  difference 
between  the  quasi-commanded  position  and  the  actual 
measured  position  was  then  used  as  the  sole  parameter 
with  which  to  train  and  test  IMS.  Any  additional  mea¬ 
surements  included  in  the  knowledge  base  resulted  in 
a  missed  detection.  This  case  demonstrates  that  IMS 
was  clearly  not  a  good  choice  for  detecting  the  partic¬ 
ular  failure  mode. 
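
The residual construction described above can be sketched as follows. This is an assumed, simplified form of the procedure: a straight-line fit from commanded current to position is estimated on nominal data, and the difference between the resulting quasi-commanded position and the measured position becomes the single parameter given to IMS. The function names are hypothetical, and the text does not state which data the regression was fitted on.

```python
import numpy as np


def fit_current_to_position(current, position):
    """Least-squares straight-line fit: position ~ slope * current + intercept."""
    slope, intercept = np.polyfit(current, position, deg=1)
    return slope, intercept


def position_residual(current, measured_position, slope, intercept):
    """Quasi-commanded position minus measured position."""
    quasi_commanded = slope * np.asarray(current, dtype=float) + intercept
    return quasi_commanded - np.asarray(measured_position, dtype=float)
```

For a stuck actuator, the measured position stays constant while the commanded current changes, so the residual grows away from its near-zero nominal range.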

Cases such as these serve as evidence that each tool
should be leveraged to promote its strengths rather
than re-adapting the tool to solve a problem that is
outside of its domain of relevance. With IMS, we
know that its strengths lie in a great potential to detect
faults that are unknown or that otherwise have not
been modeled and, to a lesser extent, to detect anomalies
that are precursors of faults before a model-based
system detects the fault. We believe that it would be
better to rely on TEAMS to detect the known failure
mode described above, rather than tuning IMS to detect
it. Reducing the set of measurements that are used
to train IMS did allow IMS to successfully detect the
simulated failures, but it reduced the potential of IMS
to detect other unknown failures.


6  SUMMARY  AND  CONCLUSIONS 

As  mentioned  previously,  we  believe  including  a  semi- 
supervised  data-driven  anomaly  detection  algorithm 
such  as  IMS  alongside  a  model-based  diagnosis  sys¬ 
tem  such  as  TEAMS  in  a  diagnostic  system  adds  sig¬ 
nificant  value,  when  used  appropriately.  Doing  so  will 
allow  the  overall  anomaly  detection  system  to  be  en¬ 
dowed  with  the  potential  to  detect  anomalies  that  can¬ 
not  be  detected  by  the  model-based  diagnosis  system 
in  isolation,  either  because  they  are  unknown  failures 
and  therefore  unmodeled,  or  because  they  are  not  fail¬ 
ures.  Furthermore,  IMS  may  detect  known  failures  in 
advance  of  the  time  that  TEAMS  would  detect  them, 
and  in  general  IMS  requires  less  modeling  effort  than 
TEAMS  (although  it  does  require  a  sufficient  quantity 
of  historical  and/or  simulated  training  data). 

It  was  also  important  to  consider  the  ability  of  IMS 
to  detect  failures  in  a  true  operational  setting,  but  there 
was  a  dearth  of  true  failures  resembling  those  that  we 
simulated  with  which  to  conduct  experiments.  There¬ 
fore,  we  hoped  to  demonstrate  an  improved  ability  of 
IMS  to  detect  simulated  failures  that  may  be  more  re¬ 
alistic  by  increasing  their  fidelity.  We  have  found  that 
for  fast  FSM  leaks,  robust  detection  performance  im¬ 
proves  for  both  the  high  and  low  fidelity  simulations, 
and  the  performance  for  both  types  of  simulations  also 
converges  as  the  leak  rate  increases.  However,  over¬ 
all  we  have  also  observed  that  there  is  no  appreciable 
difference  between  the  effect  of  using  a  low  or  high 
fidelity  simulation  on  detection  performance. 